DOCUMENT RESUME 



ED 403 739 




ABSTRACT 

mathematics tasks as an alternative method for assessing young 
Hispanic students with limited English proficiency. Students (n=45) 
from kindergarten and grades 2 and 3 were administered 14 
manipulatives-based tasks. Each task was retested over a period of 
2-3 weeks. Task performance was scored on four scales: conceptual 
understanding; efficient strategy use; accuracy; and fluency. The 
study addressed research questions of retest reliability, task 
difficulty, and relationships among subscales, patterns of change in 
subscales, relationship to general math skills, relationship to 
language proficiency, and relationship to gender. Results indicate 
that the manipulatives-based tasks were challenging for this 
population, despite use of manipulatives and continuous access to 
them in the classroom. Students improved significantly on six of the 
14 tasks over the retesting period, without targeted instruction. 
Subskills were tightly interrelated at earlier ages but highly 
differentiated in grade 3, suggesting that manipulative tasks may 
function differently according to developmental level. Changes from 
first to second assessment were even across the four criteria 
subscores. Little relationship was found between math ability and 
English or Spanish fluency. A small but consistent gender difference 
was found. (Contains 75 references.) (MSE) 
Abstract 



The purpose of this study was to examine the use of manipulatives-based mathematics 
tasks as alternative assessment methodology with young, Hispanic students with Limited 
English Proficiency (LEP). Forty-five students in Kindergarten, Grades 2 and 3 were 
administered a total of 14 manipulatives-based tasks. Each task was retested over a period 
of 2-3 weeks. Task performance was scored on four scales: (1) conceptual understanding, 
(2) efficient strategy use, (3) accuracy, and (4) fluency. This study addressed research 
questions of retest reliability, task difficulty, and relationships among subscales, patterns of 
change in subscales, relationship to general math skills, relationship to language 
proficiency, and relationship to gender. Answers to each question are presented, along 
with the caution that these results come from a small-sample, exploratory study 
and need replication. 
Mathematics Underachievement by Hispanic Students 

Hispanic students in the United States are at risk for underachievement and failure 
(Laosa & Henderson, 1991; Orum, 1986; Valencia, 1991) with mathematics being 
especially problematic (Haycock & Navarro, 1988; MacCorquodale, 1988; Policy Analysis 
for California Education, 1990). Hispanic students' underachievement in mathematics is 
recognized by third grade — the time at which the children can be reliably, formally tested 
(Dossey, Mullis, Lindquist, & Chambers, 1988). Because of the generally low 
mathematics achievement of Hispanic students, a disproportionate number are classified as 
disadvantaged and enrolled in Title I programs (Kennedy, Birman, & Demaline, 1986). 

The majority are classified as Limited English Proficient (LEP) and placed in Bilingual or 
English as a second language (ESL) programs, which currently lack sufficient published 
evidence on their effectiveness (Cziko, 1992; Lam, 1992). 

Several possible explanations have been advanced for underachievement of 
Hispanic students in mathematics. These explanations are not mutually exclusive; more 
than one is typically held by any particular researcher. One argument is that academic 
content is ineffectively taught and tested because both occur outside of Hispanic students' 
ethnographic context. The solution offered is to efficiently develop competence and 
confidence through "Ethnomathematics" (D'Ambrosio, 1987; Massey, 1989; Stigler & 
Baranes, 1988). The Ethnomathematics point of view calls for a thorough reconstruction 
of the mathematics curriculum to achieve "cultural compatibility" (Moll & Diaz, 1987; 
Trueba, 1988). The term "curricular contextualization" has also been used to refer to 
relating mathematics instruction to the home culture (Tharp, 1989). In this reconstruction, 
students' family experiences and vocabulary are used to frame mathematics problems in the 
classroom (Henderson & Landesman, 1992). 

A second explanation for Hispanic student underachievement in mathematics is that 
assessment problems are presented in abstract form and are overly reliant on verbal/reading 
skills and on linear reasoning. Some evidence exists that LEP student achievement 
improves with use of hands-on teaching and testing, permitting hands-on manipulation of 
three dimensional props (Garcia, 1991; Tharp, 1989). Through manipulatives, students 
can create concrete, visible connections of mathematical principles and procedures (Bohan, 
1995). Other researchers have lauded the potential of manipulatives in mathematics 
instruction, but cautioned that manipulatives are commonly misused (Carpenter, Fennema, 
Fuson, Hiebert, Human, Murray, Olivier, & Wearne, 1994). 

A third point of view is that LEP students lack "linguistic engagement" of their first 
language in mathematics (Henderson & Landesman, 1992). Their lack of English fluency 
prevents engagement in the highly abstract vocabulary and principles of mathematics. 
Furthermore, the semantic structure of a mathematics word problem is an important 
contributor to the problem's difficulty level (Carpenter & Moser, 1984, 1983, 1982; 
Riley & Greeno, 1988, 1983). In addition, educators have recently begun to understand 
that mathematics itself is a discrete language (Dale & Cuevas, 1987; Orr, 1987; 
Pimm, 1987; Reilly, 1988). Several researchers contend that full understanding 
of mathematics problems requires facility with their unique language, including logical, 
semantic, and syntactic features (Briars & Larkin, 1984; Dean & Malik, 1986; Kintsch & 
Greeno, 1985; Riley & Greeno, 1988; Riley, Greeno, & Heller, 1983). 

A fourth point of view on Hispanic students' underachievement in mathematics 
contends that poor-quality instruction typifies many classrooms with high enrollment of 
ethnic minorities, especially Title I, Bilingual and ESL programs. This instruction is 
denigrated as focusing on basic skills, rote learning, rigid problem-solving procedures and 
lower-order thinking skills (Kennedy, Birman & Demaline, 1986; Nunes, 1992). It is 
further contended that schools with predominately Anglo enrollments are more likely to 
attend to higher order thinking skills (Simmons, 1985). Evidence is mounting that an 
emphasis on higher order thinking activities leads to achievement gains in LEP students 
(Tikunoff, 1983; Wong Fillmore, Ammon, McLaughlin, and Ammon, 1985), especially 
those of Hispanic origin (Garcia, 1991, 1988). 
Concerted efforts such as those by the National Council of Teachers of Mathematics 
appear to be influencing math instruction in regular education, but no evidence exists on the 
impact on such special programs as bilingual, ESL, and Title I (National Council of 
Teachers of Mathematics, 1995). 

Alternative Assessment in Mathematics 

Formal assessment in both mathematics and literacy has been under sustained attack 
for over a decade, and the criticism seems to be accelerating. Wixson, Valencia, and Lipson 
(1994) recently observed that the criticism is coming from all camps, including 
psychometricians (e.g., Linn, Baker, & Dunbar, 1991; Shepherd, 1991), policy-makers 
(e.g., Darling-Hammond, 1991; Smith, 1991), test publishers (e.g., Kean, 1992), 
researchers (e.g., Calfee & Hiebert, 1991; Farr, 1992), and classroom teachers (e.g., 
Howard, 1990). Formal mathematics assessment is criticized on several grounds: (a) for 
over-emphasizing basic skills, (b) for relying on unauthentic multiple choice selection 
responses, (c) for rigidly considering only a single solution, (d) for not permitting students 
to earn partial credit, (e) for scoring only the final answer or product to the exclusion of the 
problem-solving process, (f) for failure to embed math problems in a realistic cultural 
context, and (g) for testing with artificial time constraints and without intermediate feedback 
(Alexander & James, 1987; Hambleton & Swaminathan, 1985; Gentry-Norton, 1995; 
LaCelle-Peterson & Rivera, 1994; Murnane & Raizen, 1988; Schmidt, 1983; Freeman, 
Kuhs, Porter, Floden, Schmidt, & Schwille, 1983; Stone, 1989; Tindal & Marston, 
1990; United States Department of Education, 1992). 

Thus, the contemporary movement in mathematics education toward more applied 
problem-solving, higher order thinking skills, multiple problem-solution strategies, and use 
of hands-on manipulatives can offer little in the way of assessment to accompany these 
instructional changes. Yet assessment in math and other skill areas is pervasive and 
strongly impacts what lessons are planned and taught, and what teaching efforts are 
rewarded (Salmon-Cox, 1980; Stiggins, Conklin, & Bridgeford, 1986). Because math 
assessment results are influential, one might expect a negative impact from assessment 
results which poorly reflect changes in curriculum and instruction (Gentry-Norton, 1995). 

Work with varied math testing formats, such as process assessment and open-ended 
questions, is still only at an exploratory level (Romberg, Zarinnia, & Collis, 1990). 
Reports on varied assessment formats typically emphasize their informal use, without 
establishing psychometric properties of interscorer reliability, as well as criterion-related 
validity and stability of scores over time. In fact, most informal, alternative assessment 
formats, although possessing face validity, demonstrate low psychometric quality 
(Gullickson, 1982; Natriello, 1986; Stiggins & Bridgeford, 1985). Only with reliable 
assessment can judgments be reproduced and be unambiguously communicated to 
students, parents, and others (Bird, 1990; Linn, Baker & Dunbar, 1991; Valencia & 
Pearson, 1987). 

Alternative assessment formats are often accompanied by scoring methods which 
depart from the psychometrically well-behaved and well-understood multiple choice 
selection. Complex tasks may have multiple scorable dimensions, just as tasks which 
assess the problem-solving process may be scored in multiple stages. Furthermore, a 
simple additive scoring of these discrete dimensions or stages may yield deceptive results. 
In addition, different types of score scales may be required for a single test task, e.g., a 
holistic rating along with a simple objective "0" or "1" score. The reliable scaling and 
scoring of complex tasks is a continuing concern of assessment specialists, but is yet to 
yield satisfactory solutions. 

Purpose 

The purpose of this study was to investigate the use of math manipulative tasks for 
assessing Hispanic limited English proficient students. The rationale for selecting 
manipulatives-based assessment was based on four main reasons suggested by the 
literature, as well as best practices in mathematics classrooms that include Hispanic second 
language learners and our experiences working within those contexts. First, a 
manipulatives-based task relies less on verbal linguistic skills. Second, the manipulatives 
permit concrete representation of abstractions and hands-on problem-solving. Third, a 
manipulatives-based task is not culturally embedded in either the dominant white culture or 
the Hispanic minority culture. Fourth, through a manipulatives-based task a student is more 
likely to publicly exhibit the problem-solving process, permitting it, as well as the 
product, to be assessed and evaluated. 

Recognizing that this study was exploratory in nature, we wished to examine 
different methods of scoring manipulatives-based math tasks. We also wanted to measure 
the difficulty of these tasks for young Hispanic second language learners. Finally, we 
were concerned with the traditional psychometric qualities of retest reliability and 
criterion-related validity. From this initial study, we hoped to establish guidelines for task 
construction and scoring which could be applied in a more comprehensive study with more 
tasks and more students. 

Research Questions 

We posed the following questions of our manipulatives-based math assessment 
tasks: 

1. Retest Reliability: Will Hispanic second language learners earn similar scores on 
manipulative-based math tasks when retested over a 3 day period? 

2. Task Difficulty: How difficult are manipulative-based math tasks for limited 
English proficient Hispanic learners in kindergarten and the primary Grades 2 & 3? 

3. Relationships among Four Criteria Subtest Scores: How are the four criteria scores 
(1) conceptual understanding, (2) valid and 
efficient strategies, (3) accuracy, and (4) fluency intercorrelated? 

4. Patterns of Change in Criteria Subtest Scores: What patterns of improvement are 
seen in the four criteria scores (1) conceptual understanding, (2) efficient valid 
strategies, (3) accuracy, and (4) fluency over a 3-4 week interval? 

5. Math Manipulatives Tasks and General Math Skills: How does performance in the 
four criteria scores of the manipulatives-based tasks compare to overall math skills 
as measured by a standardized math achievement test? 

6. Math Manipulatives Tasks and Language Proficiency: How does performance on 
the four manipulative task scores compare to student proficiency in both first and 
second languages (Spanish and English)? 

7. Math Manipulatives Tasks and Gender: How does performance on the criteria 
scores relate to student gender? 

Method 

Context 

This study occurred in an urban elementary school within a Houston, Texas, school 
district experiencing rapid growth in Hispanic enrollment. Hispanic students made up 60% 
of the enrollment of this school, and 78% of the school's enrollment qualified for free or 
reduced-price lunch. In this K-4 school, reading and math achievement among the Hispanic 
population was at the 18th percentile on state norms for the state-mandated Texas 
Assessment of Academic Skills (TAAS) achievement tests. Only 9% of these students 
demonstrated the mastery level expected of students at the end of Grade 3. 

The elementary school was the site of a three-year federally funded transitional 
bilingual grant project which focused on mathematics and English competency. This grant 
funded computers, including printers, other instructional materials, manipulatives, teacher 
aides, and staff development and training time for bilingual teachers and bilingual teacher 
aides. Funds were provided for five classrooms, which served as the source for our 
student respondents. The federally-funded program employed four main components: 

(1) active learning, (2) math manipulatives, (3) computer-assisted instruction, and (4) 
parental involvement. As part of the manipulatives component, classroom teachers were 
trained in the use of math manipulatives, not only in teaching but also in learning centers, 
focusing on three systems: Unifix cubes, Cuisenaire rods, and base ten blocks. Math 
learning centers in all classrooms contained these materials. Fluent bilingual teachers and 
teacher aides worked with students in small and large group instruction. Although the 
manipulatives were emphasized daily, this part of the curriculum was supplementary to the 
standard district-adopted basal math program (Spanish edition). 

Respondents 

From the five special program classrooms (two kindergarten, two Grade 2, one 
Grade 3) a total of 45 student respondents were selected. Selection was strategic, based on 
the classroom math functional level of the students as determined by teacher judgment. 

Nine students from each classroom were selected, three each with high (70th-87th %iles), 
medium (39th-58th %iles), and low (12th-30th %iles) math performance for their class. 

We wished to obtain a range of abilities at each grade level to ascertain how the novel 
manipulatives tasks would function for most students. We also required that students 
selected be communicative enough to respond verbally in Spanish in a one-to-one test 
situation. For each student an alternate was also selected in case of frequent absenteeism or 
moving. Because of student attrition and inclusion of alternates, we completed the study 
with 45 students (18 males, 27 females). Males and females were evenly distributed across 
general math ability levels. Table 1 shows the distribution of students by gender, and by 
general math skill level. 

Insert Table 1 about here 

Table 1 indicates that we completed the study with double the number of 
kindergarten students as from Grade 2 or Grade 3. At each grade level, the 
distribution of the numbers of high, medium, and low achieving students was similar. 
Students' Spanish and English language competencies had been evaluated within the past 6 
months on the Idea Oral Proficiency Test (IPT). Table 2 indicates that no students were 
fluent in English (FES level), and less than half of the respondents were fluent in Spanish. 
This deficiency in both languages can be expected to influence learning in all academic 
areas. Half the students spoke not even basic survival English (NES level), which is why 
classroom instruction was conducted completely in Spanish, with conversational English 
taught as a skill area. For all 45 students, Spanish was the home language. 

Insert Table 2 about here 

One-third of the students had been in the United States less than 3 years. While in 
Mexico and/or Central America, most of the younger students did not attend school or did 
so only occasionally. Immigrant children who were age-ready for Grades 2 or 3 often had 
only sporadic or no prior schooling in Latin America. Thus, in addition to the burden of 
lack of linguistic fluency, these students were still undergoing the process of acculturation 
to U.S. society and schools. 

Materials 

From the supplementary teachers' manuals and workbooks accompanying the 
Cuisenaire rods and base ten blocks, we modified several student learning tasks to create 
tasks suitable for assessment. Suitable tasks had to meet the following six standards: 

1. Tasks embodied important mathematical concepts and principles, 

2. Tasks were appropriate for low-average-to-average math skill levels in kindergarten 
through Grade 3, 

3. Tasks had not been directly taught or practiced by teachers, 

4. Both the process and product of task solution were visible and appeared to be 
scorable, 

5. Tasks appeared sufficiently challenging for the designated grade level so we could 
anticipate multiple practice occasions for each student before mastery, and 

6. Tasks could be efficiently completed with a minimum of teacher directions, and 
with relatively few standard materials. 

We began the study by developing 35 tasks; however, after three months of pilot 
testing that number was reduced to 14 key test tasks. Through pilot testing, we identified 
inaccuracies in the initial designation of grade-level tasks. Of the 14 final tasks, seven were 
most appropriate for kindergarten, only two for Grade 2, and five for Grade 3. 
Had we been able to include Grade 1, several of the Kindergarten tasks would 
likely have been more appropriate there. However, because we conducted final testing in 
March/April, several of the Kindergarten students were performing similarly to Grade 1 
students. The 14 tasks are named in Table 3, with appropriate grade level and key 
mathematical concepts, principles, and skills embedded in each task. Illustrated directions 
and materials for each task are included as Appendix I. 

Insert Table 3 about here 



Task Scoring 

Student task performance was scored according to four criteria, covering both the 
problem-solving process and the final solution. Each criterion received a rating on a 1-5 
ordinal scale. The first three criteria used the same scale, but the fourth criterion required a 
different scale. Following are the four criteria and their scales: 

(a) conceptual understanding, (b) use of valid, effective strategies, and (c) accuracy of task 
completion. [Scale: 1 = none or almost none, 2 = only a little, 3 = some, partially, 
4 = most, mainly, 5 = fully, complete] 

(d) fluency and speed of task completion. [Scale: 1 = very slow & disfluent, 
2 = somewhat slow & disfluent, 3 = adequate/average speed & fluency, 
4 = good speed & fluency, 5 = very good/excellent speed & fluency] 

Because of the multiple scoring criteria, all students were required to attempt the full 
task. For this reason, if students became stuck, they were given minimal prompts, in the 
form of questions, to help them focus on relevant task features. All such examiner prompts 
were recorded on the score sheet, and served to negatively weight the final ratings. 

Procedure 

For three months, two fluent bilingual graduate students in a Master of Education 
Program (M.Ed.) piloted the manipulatives-based tasks with 15 Hispanic second 
language learners in a different elementary school in a different school district. This pilot 
group mirrored the main sample in that it contained similar numbers of high, medium, and 
low achieving students, as nominated by teachers. During the pilot phase we improved the 
instructions, task structure, and scoring procedures. Also during this pilot phase, inter-rater 
reliability was assessed over a three-day period with the 15 students, using the final 
versions of the tasks and score sheets. For retest reliability, each student was assessed 
twice with four different tasks which were selected with the assistance of the classroom 
teacher to be challenging for that student. 

Following the pilot-testing and revisions, the test-tasks were presented to the 
selected 45 respondents. All directions were in Spanish. Three-to-four challenging tasks 
were chosen for each student, depending upon the speed with which students were able to 
work. The three or four tasks were presented to each student individually within a single 
session. Administration of each task required approximately 3-5 minutes. After a 2-to-3 
week break, equivalent versions of the same tasks were re-administered to all students. 

Results 

Retest Reliability 

Retest reliability was assayed during the pilot phase of the study with a small group 
of 15 students. Our expectations for retest reliability were different for these 
manipulatives-based tasks than they would have been for standard test items. To the extent 
that the manipulatives-based assessment was "dynamic," i.e., the examiner provided 
prompts to the student, students might be expected to learn from the examination. This 
learning phenomenon might occur even though the student did not receive feedback on the 
correctness of his/her final response. In fact, a professed advantage of dynamic 
assessment is the students' ability to learn while being assessed (Lidz, 1987). To the extent 
that our students might learn differentially from examiner prompting, we expected lower 
retest reliability. Retest reliability results are presented in Table 4. This table indicates that 
reliability coefficients obtained from simple rank order (Contingency Coefficient) were 
moderately strong (.69 - .77). However, when corrected for scaling irregularities 
(Cramer's V) the coefficients were lower (.48 - .71). Finally, when the level of agreement 
was conservatively corrected for all chance agreement through Cohen's Kappa (Cohen, 
1960), the resulting coefficients varied from strong to weak (.74 - .32). Kappa values 
between .40 and .75 represent fair-to-good agreement beyond chance (Fleiss, 1989). Table 
4 indicates that the retest reliability varied depending on grade level and/or on which test 
tasks were used (these two variables were confounded). The lower two grade levels 
(Kindergarten, Grade 2) produced weaker retest reliability. 
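
As an illustration only, the three agreement indices named above can be computed from a cross-tabulation of first and second ratings. The sketch below is not the authors' procedure: the function names and the sample ratings are hypothetical, and the formulas are the standard textbook definitions of Pearson's contingency coefficient, Cramer's V, and Cohen's Kappa.

```python
import math
from collections import Counter


def crosstab(first, second, categories):
    """Cross-tabulate paired ratings: rows = first testing, columns = retest."""
    counts = Counter(zip(first, second))
    return [[counts[(a, b)] for b in categories] for a in categories]


def chi_square(table):
    """Pearson chi-square statistic and total N for a contingency table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip rating categories that were never used
                chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def contingency_coefficient(table):
    """Pearson's C; its maximum is below 1 and depends on table size."""
    chi2, n = chi_square(table)
    return math.sqrt(chi2 / (chi2 + n))


def cramers_v(table):
    """Cramer's V rescales chi-square so that 1 means perfect association."""
    chi2, n = chi_square(table)
    used_rows = sum(1 for row in table if sum(row) > 0)
    used_cols = sum(1 for col in zip(*table) if sum(col) > 0)
    k = min(used_rows, used_cols)  # assumes at least two categories were used
    return math.sqrt(chi2 / (n * (k - 1)))


def cohens_kappa(table):
    """Exact agreement corrected for the agreement expected by chance."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    observed = sum(table[i][i] for i in range(len(table))) / n
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical 1-5 ratings for ten students, tested twice.
first_ratings = [1, 2, 3, 3, 4, 5, 2, 4, 3, 5]
retest_ratings = [1, 2, 3, 4, 4, 5, 3, 4, 3, 4]
table = crosstab(first_ratings, retest_ratings, categories=[1, 2, 3, 4, 5])
print(round(contingency_coefficient(table), 2),
      round(cramers_v(table), 2),
      round(cohens_kappa(table), 2))
```

With perfectly matching ratings on a five-point scale, V and Kappa both reach 1 while the contingency coefficient does not, which is one reason the three indices can diverge on the same data.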

Insert Table 4 about here 



Task Difficulty 

The second research question asked how difficult these manipulatives-based tasks 
were for low-achieving LEP students. The published literature fails to offer such basic 
information, although manipulatives-based tasks are commonly recommended. We 
examined task difficulty separately for the first and second assessments, spaced 3-4 weeks 
apart. Here, our index of task difficulty was a simple average of all four criteria 
(conceptual understanding; use of valid and effective strategies; accuracy of task 
completion; and fluency & speed of task completion). Results are presented in Table 5. 
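
The difficulty index described above can be sketched as follows. This is a minimal illustration under stated assumptions: the dictionary keys, function names, and sample ratings are hypothetical, and the examiner-prompt weighting mentioned under Task Scoring is not modeled because its formula is not given.

```python
from statistics import mean

CRITERIA = ("conceptual", "strategy", "accuracy", "fluency")


def student_index(ratings):
    """Simple average of one student's four 1-5 criterion ratings on a task."""
    for name in CRITERIA:
        if not 1 <= ratings[name] <= 5:
            raise ValueError(f"{name} rating must be between 1 and 5")
    return mean(ratings[name] for name in CRITERIA)


def task_difficulty(all_ratings):
    """Mean index across students; lower values indicate a harder task."""
    return mean(student_index(r) for r in all_ratings)


# Hypothetical ratings for two students on one task.
sample = [
    {"conceptual": 4, "strategy": 3, "accuracy": 3, "fluency": 2},
    {"conceptual": 5, "strategy": 4, "accuracy": 4, "fluency": 3},
]
print(task_difficulty(sample))  # prints 3.5
```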

Insert Table 5 about here 

Manipulatives-based math tasks proved challenging for the students. For the first 
assessment, overall task difficulty was similar for the three grade levels, closer to a middle 
rating of 3 (some, partially) than a rating of 4 (most, mainly) (Kindergarten = 3.04, 
Grade 2 = 3.30, and Grade 3 = 3.25). Students found the tasks somewhat less difficult 
during the second assessment (Kindergarten = 3.37, Grade 2 = 3.73, and Grade 
3 = 3.62). 

Of the 14 tasks, students' average performance improved on twelve, and 
deteriorated on two. Of the twelve improved tasks, the skill improvement was statistically 
significant for only six. The effect sizes of these six gains ranged from small (one third of 
a standard deviation) to large (nearly a full standard deviation). An effect size of .40 to .50 
or larger is considered large enough for practical significance in most educational 
achievement (Hedges & Olkin, 1985; Wolf, 1986). 
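
For readers unfamiliar with effect sizes expressed in standard deviation units, one common form is the mean gain divided by a pooled standard deviation. The text does not state which formula was used, so the sketch below is an assumption; the function names and scores are hypothetical.

```python
import math
from statistics import mean, stdev


def standardized_gain(first, second):
    """Mean gain from first to second assessment in pooled-SD units."""
    pooled_sd = math.sqrt((stdev(first) ** 2 + stdev(second) ** 2) / 2)
    return (mean(second) - mean(first)) / pooled_sd


def practically_significant(effect_size, threshold=0.40):
    """Apply the lower bound of the .40 to .50 rule of thumb cited in the text."""
    return abs(effect_size) >= threshold


# Hypothetical task indices for five students on two occasions.
first_scores = [2.5, 3.0, 3.25, 3.5, 2.75]
second_scores = [3.0, 3.25, 3.5, 4.0, 3.25]
d = standardized_gain(first_scores, second_scores)
print(round(d, 2), practically_significant(d))
```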

The significantly improved tasks were not from one single Grade level. The two 
deteriorated tasks were from Kindergarten and Grade 3. Task improvement could not be 
predicted from the initial performance level. 

Relationship among Four Subtest Scores 

More than one judgment scale is required to score both a student's answer or 
product and the problem-solving process he or she used. Although multi-scale scoring is 
commonly recommended, published literature on alternative assessment fails to describe 
relationships among separate scales. 

Intercorrelations among the performance ratings of the four criteria were examined 
(conceptual understanding, use of valid and effective strategies, accuracy of task 
completion, and fluency & speed of task completion). Prior to analysis, all scores were 
standardized within tasks, to control for differences in difficulty across tasks. Hierarchical 
cluster analyses of intercorrelation matrices were then conducted both within and across 
tasks. The cluster trees in Figure 1 show similar results for different tasks within a grade 
level, but quite different cluster patterns from one grade level to the next. 
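
The standardization and correlation steps can be sketched as below. This is a simplified illustration, not the analysis actually run: the record layout and function names are hypothetical, and instead of a full hierarchical cluster analysis the sketch stops at identifying the most strongly correlated pair of criteria, which corresponds to the first merge in an agglomerative clustering.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

CRITERIA = ("conceptual", "strategy", "accuracy", "fluency")


def z_scores(values):
    """Standardize one criterion within one task (mean 0, SD 1)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s > 0 else 0.0 for v in values]


def standardize_within_tasks(records):
    """Pool z-scores across tasks so task difficulty does not drive correlations."""
    by_task = {}
    for rec in records:
        by_task.setdefault(rec["task"], []).append(rec)
    columns = {name: [] for name in CRITERIA}
    for task_records in by_task.values():
        for name in CRITERIA:
            columns[name].extend(z_scores([r[name] for r in task_records]))
    return columns


def pearson(x, y):
    """Pearson correlation; assumes neither list is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(ssx * ssy)


def tightest_pair(columns):
    """Most strongly correlated pair of criteria (first agglomerative merge)."""
    return max(combinations(CRITERIA, 2),
               key=lambda pair: pearson(columns[pair[0]], columns[pair[1]]))


# Hypothetical ratings: four students on each of two tasks of differing difficulty.
records = [
    {"task": "A", "conceptual": 5, "strategy": 2, "accuracy": 1, "fluency": 1},
    {"task": "A", "conceptual": 4, "strategy": 4, "accuracy": 2, "fluency": 2},
    {"task": "A", "conceptual": 3, "strategy": 3, "accuracy": 3, "fluency": 3},
    {"task": "A", "conceptual": 4, "strategy": 5, "accuracy": 4, "fluency": 4},
    {"task": "B", "conceptual": 4, "strategy": 1, "accuracy": 2, "fluency": 1},
    {"task": "B", "conceptual": 3, "strategy": 3, "accuracy": 3, "fluency": 2},
    {"task": "B", "conceptual": 2, "strategy": 2, "accuracy": 4, "fluency": 3},
    {"task": "B", "conceptual": 3, "strategy": 4, "accuracy": 5, "fluency": 4},
]
columns = standardize_within_tasks(records)
print(tightest_pair(columns))
```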

Insert Figure 1 about here 



At Kindergarten level, the four criteria skill ratings were all strongly 
intercorrelated. The two most cohesive or "tightest" clusters were: (a) fluency + efficient 
strategies, and (b) conceptual understanding + accuracy. At Grade 2, the four skills were 
moderately to strongly intercorrelated. The Grade 2 clusters differ from those for 
kindergartners. In Grade 2, fluency + accuracy formed the tightest cluster, followed by 
conceptual understanding + valid and efficient strategies. At Grade 3 only one very tight 
cluster was noted, and it was fluency + accuracy (r = .96). The other two criteria were 
only moderately correlated. In general, there appears to be greater differentiation of 
component skills as one proceeds through the primary grade levels. In addition, as one 
proceeds up the grade levels, the relationship between fluency and accuracy becomes 
stronger, i.e., students who solve tasks rapidly and fluently are also more likely to obtain 
the correct answer. 

Patterns of Change in Criteria Subtest Scores 

We earlier examined the overall improvements of average ratings from first to 
second assessment. The current question examines patterns of improvements among the 
four criteria subtest scores. Do ratings of the four criteria scores change together, or do 
improvements in one criterion tend to lead the others? The obtained patterns are described 
graphically in Figure 2. Figure 2 depicts for each grade level the criterion score changes 
between first and second testings. Error bars mark 85% confidence intervals, based on 
standard error of the mean. Score patterns at the three Grade levels bear similarities and 
differences. At all Grade levels students were rated highest for conceptual understanding. 
At Kindergarten, fluency was rated next highest, and efficient strategies lowest. At Grade 
2, valid and efficient strategies, accuracy, and fluency were rated similarly. At Grade 3, 
efficient strategies was rated second highest, followed by accuracy and fluency. 
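
Error bars of the kind described for Figure 2 can be approximated from a mean and its standard error. The sketch below assumes a normal approximation, which the text does not confirm; with samples this small a t-based interval would be somewhat wider. The function name and scores are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(scores, level=0.85):
    """Two-sided interval for the mean: mean +/- z * (SD / sqrt(n))."""
    m = mean(scores)
    sem = stdev(scores) / math.sqrt(len(scores))  # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)     # about 1.44 for an 85% interval
    return m - z * sem, m + z * sem


# Hypothetical criterion ratings for one grade level at one testing.
ratings = [3, 4, 5, 4, 3, 5, 2, 4]
low, high = mean_confidence_interval(ratings)
print(round(low, 2), round(high, 2))
```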

Insert Figure 2 about here 

Patterns of improvement from first to second testing appear relatively consistent 
across skills and grade levels. A minor exception is the relatively greater mean gain in 
accuracy and fluency by Grade 3 students. However, these mean gains must be interpreted 
in light of more highly dispersed score distributions (larger standard error of the mean). 

Subskill Performance by General Math Skills 

For a new assessment tool to demonstrate criterion-related validity, it must relate 
well to established measures of a similar nature. However, when the target and criterion 
tests require substantially different types of performance, smaller validity coefficients 
generally result. To relate general math levels to manipulatives-based task scores, we first 
standardized the scores within each grade level to control for between-grade differences in 
task difficulty. Then all grade levels were combined for analysis. We conducted a mixed 
(repeated measures and factorial) ANOVA, with the criteria scores serving as 
repeated measures (with four levels) and general math level as the categorical grouping 
variable with three levels (High = 15, Medium = 16, Low = 14). The ANOVA resulted in a 
large main effect for general math skill level [F(2 df) = 6.209, p = .0043]. The main 
effect for criteria score type was not significant [F(3 df) < 1, p = .99]. No significant 
interaction was noted between these two variables [F(6 df) = .294, p = .93]. Figure 3 
presents the interaction graph for this analysis. Figure 3 also demonstrates that the 
manipulatives subtest scores neatly separate low general math skill levels from medium and 
high levels. Interestingly, students at the medium skill level were the top performers in 
using efficient strategies, although a post-hoc analysis proved this difference between 
medium and high groups to be too small for statistical significance. Figure 3 also 
demonstrates relatively flat performance across the four criteria subtest ratings at each 
Grade level, especially for students in the low achieving group. 

Insert Figure 3 about here 

Subtest Performance by Language Proficiency 

Manipulatives-based tasks are recommended for LEP students in part because the 
tasks appear to be less verbally loaded, and thus permit demonstration of non-verbal math 
skills. The students in this study scored low on both English and Spanish abilities; 
however, a range of language skills did exist. We were therefore able to conduct a mixed 
ANOVA, with the criteria scores as repeated measures and language skills as the categorical 
grouping variable (five levels). The five language skill groups were (1) FSS/LES (n = 5), (2) 
FSS/NES (n = 13), (3) LSS/LES (n = 12), (4) LSS/NES (n = 11), and (5) NSS/NES (n = 4). 

Interestingly, the analysis showed no main effect for language, [F(4 df) = 1.12, 
(p = .36)]. Nor did a main effect exist for subtest scores, [F(3 df) = .0001, (p = .99)]. 
However, a significant interaction did exist; different criteria score patterns were revealed 
for different language groups, [F(12 df) = 3.003, (p = .001)]. Figure 4 presents the 
interaction graph for this analysis. 

Insert Figure 4 about here 

All manipulatives-based tasks were presented in students' first language, Spanish. 
We expected that because manipulatives-based tasks were presented in Spanish, a student's 
level of Spanish proficiency would predict math-task performance. However, Figure 4 
shows clearly that Spanish fluency alone relates only randomly to any one of the four 
criteria scores. We further expected that because English is the main academic language, 
even in Texas's highly bilingual districts, fluency in English would also predict math-task 
performance. However, Figure 4 also shows that a simple English proficiency level fails to 
predict any criterion score except for Fluency. For the Fluency rating, students with limited 
English skills (LES) outperformed students with no English skills (NES). 

Because of the lack of predictive power of simple English or Spanish proficiency, 
we next examined patterns of combined English and Spanish proficiency. Here, again, 
Figure 4 depicts counterintuitive results. On the manipulatives-based tasks, the lowest 
overall performers were not those students with lowest combined language skills. In fact, 
the overall lowest performing students were fluent in Spanish, with no English skills 
(FSS/NES), and the highest performing students (by a slim margin) were those without 
fluency in either language (NSS/NES). This result was so surprising that we individually 
examined the four NSS/NES student cases from our database to help explain this 
phenomenon. Of these four students, one tested with high math skills, one medium, and 
two low. All four of these students were females, two in Grade 2 and two in Grade 3. 
Only one student was a recent arrival to the US (1.75 years). Two of the four had been in 
the US approximately 3.5 years, and one student was born here. Figure 4 also shows that 
these four NSS/NES students exhibited a great range in subscores, with efficient strategies 
the highest among any group, and fluency only average. 

Two groups of students performed nearly identically: FSS/LES and LSS/LES, 
which seems to indicate the relative unimportance of Spanish fluency on these academic 
tasks, given at least limited English proficiency. 

Subtest Performance by Gender 

The field of math education has a considerable body of knowledge on the 
underperformance of female students (Hanson, 1992; Olszewski-Kubilius, 1990). Among 
the Hispanic population, this underachievement is especially pronounced, possibly because 
the culture has tended to emphasize female submissiveness (Heller, 1966; Madsen, 1961). 
Furthermore, some evidence exists that whereas females exhibit relatively stronger verbal 
skills in solving academic tasks, males exhibit stronger problem-solving skills on 
three-dimensional, engineering-type tasks (Sadker & Sadker, 1994). Based on these 
generalizations, we predicted that the manipulatives-based tasks would favor the male 
students. 

The student respondents included 27 females and 18 males. Gender was 
distributed nearly identically across general math skill levels (see Table 1), so a 
gender-based comparison on manipulatives-based tasks could be made without concern that the 
general math skill level would confound the comparison. We conducted a mixed ANOVA, 
with the subtest scores as repeated measures and gender as the categorical grouping 
variable. The ANOVA resulted in no main effect for gender, [F(1 df) = .80, (p = .37)]. 
Neither was there a main effect for type of criteria subtest score, [F(3 df) = .05, (p = .98)], 
nor was the interaction significant, [F(3 df) = .07, (p = .97)]. The interaction graph from 
this ANOVA is presented in Figure 5. It indicates that average subtest scores for males 
were consistently higher than for females, but not significantly so. In every case, the 85% 
confidence interval bars overlap with mean scores. We concluded that there was a 
consistent, but non-significant, gender effect on math manipulatives-based test scores. 

Insert Figure 5 about here 

Discussion 

This study was an exploratory study of the use of manipulatives-based mathematics 
assessment with 45 Hispanic limited English proficient students in Kindergarten, Second 
and Third Grades. According to published literature, such manipulatives-based tasks 
potentially offer several benefits: (a) abstract concepts can be demonstrated in concrete 
terms, (b) multi-modalities (including tactile and kinesthetic) are applied in problem- 
solving, (c) the problem-solving process as well as the final answer are scored, and (d) 
students without solid language proficiency have a better chance at success. Despite these 
claims, little empirical evidence exists on manipulatives-based assessment with Hispanic 
LEP students. We were especially interested in the psychometric behavior of these tasks 
(their reliability and validity) and in how Hispanic LEP students performed. Although the 
student sample was small, it was strategically selected to permit several revealing analyses. 

Constructing suitable, scorable tasks required considerable pilot-testing, which 
yielded only 14 usable tasks from an initial set of 35. Thus the conversion from teachable 
task to testable task is not automatic. In addition, several initial tasks appeared upon closer 
scrutiny to be superficial; i.e., they did not reflect important, relevant math concepts or 
processes worthy of assessment. Moderately strong retest reliability was obtained on 12 of 
these tasks over a 3-day retest period, although somewhat lower reliability was obtained for 
Kindergarten students than for older students. The obtained reliability was better than we 
had anticipated, given the different levels and types of cues needed by our examiners for 
students to complete the tasks. Perhaps our greatest scoring dilemma continues to be the 
reliable treatment of idiosyncratic cues, which must occur in this type of dynamic 
assessment. 

The manipulatives-based tasks proved challenging for Hispanic LEP students, 
despite the fact that they had practiced using the manipulatives in the classroom and had 
continuous access to them. The students improved significantly on six of the 14 tasks over 
a retesting period of 2-3 weeks, without targeted instruction. This improvement could have 
been due to the occasional prompts received by students during the first testing or to 
student-initiated discovery back in the classroom. 

The research question on interrelationships among the four criteria scores seemed to 
favor a developmental hypothesis. The subskills changed from being tightly interrelated at 
earlier ages to being highly differentiated in Grade 3. In fact, the combination of Fluency and Accuracy became 
more tightly linked, and approached unity (r = .96) in Grade 3. These findings caution 
against cross-Grade pronouncements about how students react to or use math 
manipulatives. In Grade 3, Fluency implied Accuracy, to the extent that the separate scores 
were largely redundant. However, in Kindergarten, Fluency was less likely to imply 
Accuracy. Thus, manipulatives tasks may function differently at different developmental 
stages, and future research should be sensitive to that probability. 

We were also interested in how subskills changed: together, or unevenly with 
one subskill leading the others. Two assessment periods did not provide an ideal 
opportunity for profiling change over time; however, we were able to draw some gross 
conclusions. If skills change unevenly, perhaps one subskill could serve as diagnostically 
useful advance notice of later learning. An orderly acquisition of subskills might also 
provide suggestions for an instructional sequence or changing focus. However, our 
findings were not supportive. Changes from first to second assessment were relatively 
even across the four criteria subscores. Only some differentiation appeared in Grade 3, 
where relatively greater gains were made in Fluency and Accuracy compared to Efficient 
Strategies and Conceptual Understanding. This phenomenon might be described as 
students learning how to quickly arrive at the solution without concomitantly showing 
greater understanding. In that case, students might be treating the manipulatives as artificial 
tools rather than as the conceptually meaningful tools intended by the examiners. 

On the question of criterion-related validity of the manipulatives-based math tasks, the evidence was 
only mildly supportive. The criterion measure was a traditional, mainly multiple-choice, 
broad assessment of math skills which had been district-normed. However, the 
manipulatives-based tasks were not able to differentiate medium (39th - 58th %iles) from 
high (70th - 87th %iles) achievers. That low scores were clearly differentiated from medium and high scores may 
suggest that manipulatives-based tasks are most sensitive to the abilities of low-achieving 
students and are best used with them. 

The lack of relationship between either English or Spanish language fluency and 
math ability was found to be interesting and confirmatory of those researchers who claim 
an advantage of manipulatives to be that they are not as verbally dependent as other tasks. 
On the other hand, these results might be considered disconfirmatory of those who claim 
lack of academic achievement to be largely due to the lack of language fluency. However 
interpreted, students without fluency in either English or Spanish scored highest or nearly 
highest in three of the four criteria subskills. Further investigation showed that all of these 
atypical students were females, and that all had been in the US for at least 1.75 years. 
Although tasks were administered in Spanish, it mattered little to scores whether students 
were fluent in Spanish or possessed only limited Spanish skills. 

The examination of the role of gender in manipulatives-task performance revealed 
small and statistically non-significant, but consistent, differences. Although boys and girls showed virtually 
identical average performance on the district's criterion-referenced math test, the 
manipulatives-based test slightly favored boys, across all subtests. We had hoped not to 
find a gender bias built into the manipulatives-based tasks, but had noted that their 
manipulation may reflect the construction play which is more often modeled for boys, and 
which boys are therefore more likely to practice. Manipulatives also require a modicum of 
physical forwardness or initiative in reaching out to take and manipulate. Some young girls 
observed were very hesitant and shy about so doing. Less shyness may have been present 
had the classroom teachers or aides, themselves, conducted the assessments. On the other 
hand, the examiners were frequent visitors to these classrooms. Furthermore, 
confirmatory remarks by teachers indicated that the performances of the students tended to 
be typical. 

An initial investigation into the behavior of novel manipulatives-based math tasks, 
the current study raised more questions than possible answers. Even in the case of clear-cut 
results, this study's number of participants (N) was too small to draw firm and valid 
conclusions. Yet, some commonly held assumptions were not confirmed by this study: (a) 
students' language skills were largely irrelevant to task performance, (b) the tasks were not 
well validated (by an external measure) for medium and higher achieving students, (c) 
criteria subskills of conceptual understanding, efficient use of strategies, accuracy, and 
fluency all appeared to improve in unison, rather than in a logical sequence, (d) these same 
criteria subskills did not covary the same way across the three Grade levels. For example, 
the fluency subscale may be measuring something very different at Grade 3 versus 
Kindergarten. 

The fact that we were able to obtain moderate interrater reliability in scoring a 
number of manipulatives-based tasks, and could efficiently administer these tasks to the 
sample of Hispanic LEP students, indicates that these types of tasks are amenable to 
experimental study. We are concerned about the vagaries of small samples, and seek and 
recommend a replication study with a separate and larger sample. As an exploratory study, 
this one has served its purpose, raising several issues with classroom implications. However, at 
this point, to discuss implications in detail would be overstating the case; first, replication is 
needed. 

Table 1 

Student Respondents, with Gender and Achievement Levels on District-standardized 
Criterion-Referenced Math Test 

Gen. Skills*,**   Kindergarten   Grade 2   Grade 3   Totals & Gender 
High                    9            3         3      15 (9 F, 5 M) 
Medium                  8            4         4      16 (9 F, 7 M) 
Low                     9            3         2      14 (9 F, 6 M) 
Totals                 26           10         9      45 (27 F, 18 M) 

*General math skills from school district CRT standardized math test. 

**High = (70th-87th %iles), Medium = (39th-58th %iles), Low = (12th-30th %iles) 



Table 2 

Spanish and English Proficiencies of Respondents, From Recent IPT Testing 

Spanish / English*   Kindergarten   Grade 2   Grade 3   Totals 
FSS / LES                  4            0         1        5 
FSS / NES                  6            2         5       13 
LSS / LES                  8            4         0       12 
LSS / NES                  8            2         1       11 
NSS / NES                  0            2         2        4 
Totals                    26           10         9       45 

*NSS = Non Spanish Speaker, LSS = Limited Spanish Speaker, 
FSS = Fluent Spanish Speaker, NES = Non English Speaker, 
LES = Limited English Speaker, FES = Fluent English Speaker 



Table 3 

Fourteen Manipulative Test-Tasks, with Grade Level and 
Key Mathematical Concepts, Principles, Skills & Computation 

KINDERGARTEN 

1. Making Stair-steps. 
   CONCEPTS & PRINCIPLES 
   •ordinal relationship 
   •increments of one 
   •fixed volume increments 
   SKILLS & COMPUTATION 
   •visual scanning for approximate rod length 
   •compare & contrast rods 
   •sort by size 

2. Equal trains. 
   CONCEPTS & PRINCIPLES 
   •equivalence in rod combinations 
   •addition of two quantities to make a new whole 
   SKILLS & COMPUTATION 
   •addition 
   •selecting rods to fit missing part of whole 

3. Find the missing rod. 
   CONCEPTS & PRINCIPLES 
   •missing part to make a whole 
   •increase in quantity and/or difference in quantities 
   SKILLS & COMPUTATION 
   •addition or subtraction 
   •visual scanning for approximate rod length 
   •measuring rods 

4. Which is more? 
   CONCEPTS & PRINCIPLES 
   •increase in quantity to make a new whole (equivalence) 
   •"more" related to rod length 
   •combinations form a whole 
   •number sense - quantity differences 
   SKILLS & COMPUTATION 
   •measuring rod combinations 
   •compare/contrast rods 

5. Finding "one-bigger". 
   CONCEPTS & PRINCIPLES 
   •increment of one and/or addition of one 
   •counting related to increment of one 
   SKILLS & COMPUTATION 
   •rod length-name association 
   •measuring rods 
   •addition 

6. Find the length of a train. 
   CONCEPTS & PRINCIPLES 
   •relation between rod combinations & continuous counting 
   •one-to-one relationship 
   •increase in quantity to make a new whole 
   SKILLS & COMPUTATION 
   •selecting rods by name 
   •use of numberline to find total length 
   •addition 
   •counting (numeration) 

7. Half the size. 
   CONCEPTS & PRINCIPLES 
   •"half the size" 
   •"doubling" to make a whole 
   •equivalent fractions 
   •addition of two quantities to make a new whole 
   SKILLS & COMPUTATION 
   •identifying halves of even numbers 2-10 
   •addition or division 

GRADE TWO 

8. Missing addends. 
   CONCEPTS & PRINCIPLES 
   •concept of missing addend as missing rod to total 
   •tying concrete model to abstract equation 
   •addition and/or subtraction of quantity needed to form equal whole 
   SKILLS & COMPUTATION 
   •solving missing addend equation 
   •addition and/or subtraction 
   •scanning for quantity needed to complete whole 

9. Ten more than. 
   CONCEPTS & PRINCIPLES 
   •concept of "ten more than" as increment to tens place 
   •increase base ten values by tens 
   SKILLS & COMPUTATION 
   •ability to read tens and ones rods as two-digit number 
   •addition 
   •counting 

GRADE THREE 

10. Writing Three-Digit Numbers. 
   CONCEPTS & PRINCIPLES 
   •relation between numeral place and particular size rods 
   •concrete representation of digital quantity 
   •base ten place value (hundreds, tens, ones) 
   SKILLS & COMPUTATION 
   •writing three-digit number from place value model 
   •making correct value substitution 

11. Simple Two-Digit Addition. 
   CONCEPTS & PRINCIPLES 
   •relation between rod size and place value 
   •base ten place value 
   •add two quantities to form new whole (base ten) 
   •tying abstract and concrete: column addition as combining similar rods 
   SKILLS & COMPUTATION 
   •adding two-digit numbers in column addition 

12. Two-Digit Addition with Regrouping. 
   CONCEPTS & PRINCIPLES 
   •relation between regrouping in addition and place value 
   •base ten place value 
   •add two quantities to form new whole (base ten) 
   •tying concrete and abstract representation of increase in quantity 
   •regrouping to represent correct value 
   SKILLS & COMPUTATION 
   •two-digit addition in columns, with regrouping 
   •substitutions of units for tens 

13. Trading Up. 
   CONCEPTS & PRINCIPLES 
   •base ten place value substitutions: equivalence of ten units to one 10 & ten 10s to one hundred 
   •concrete representation of quantity using base ten values 
   SKILLS & COMPUTATION 
   •making value substitution: regroup from ones to tens and tens to hundreds 

14. Two-Digit Subtraction with Regrouping. 
   CONCEPTS & PRINCIPLES 
   •base ten place value 
   •subtract two quantities to form new whole (base ten) 
   •regrouping to represent correct value 
   •relation of backward regrouping in subtraction to place value 
   SKILLS & COMPUTATION 
   •two-digit column subtraction with borrowing 
   •substitutions of tens for units 

Table 4 

Retest Reliability of Twelve Manipulatives-based Tasks with Young 
Hispanic ESL Students (N = 15). 

Grade          N    CC*   Cramer's V   Cohen's Kappa 
Kindergarten   6    .69      .48            .40 
Grade 2        5    .74      .63            .32 
Grade 3        4    .77      .71            .74 

*Contingency Coefficient 
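
For readers unfamiliar with the three agreement indices in Table 4, the sketch below computes them from a test-by-retest cross-tabulation using their standard textbook definitions. The source does not describe how its coefficients were computed, so the function names and the 2 x 2 table are assumptions for illustration only (the counts are hypothetical, not data from the study).

```python
from math import sqrt


def _totals(table):
    """Return (n, row totals, column totals) for a contingency table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    return sum(row_tot), row_tot, col_tot


def chi_square(table):
    """Pearson chi-square; assumes no row or column total is zero."""
    n, row_tot, col_tot = _totals(table)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def contingency_coefficient(table):
    n, _, _ = _totals(table)
    chi2 = chi_square(table)
    return sqrt(chi2 / (chi2 + n))


def cramers_v(table):
    n, _, _ = _totals(table)
    k = min(len(table), len(table[0]))
    return sqrt(chi_square(table) / (n * (k - 1)))


def cohens_kappa(table):
    """Chance-corrected agreement for a square test-by-retest table."""
    n, row_tot, col_tot = _totals(table)
    p_observed = sum(table[i][i] for i in range(len(table))) / n
    p_expected = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical cross-tabulation (rows = first rating, columns = retest).
retest = [[8, 2],
          [2, 8]]
print(contingency_coefficient(retest), cramers_v(retest), cohens_kappa(retest))
```

The contingency coefficient and Cramer's V measure association, whereas kappa measures exact agreement beyond chance, which is one reason the three columns of Table 4 can diverge within a grade.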



Table 5 

Mean Ratings and Improvement for 14 Manipulatives-Based Tasks Over a 
3-4 Week Interval 

TASK                                          1st M    SD    2nd M    SD     p*    ES** 

KINDERGARTEN 
1. Making Stair-steps.                         3.76   0.94   3.60   1.21    NS    -.17 
2. Equal trains.                               3.69   1.17   3.88   1.10    NS     .16 
3. Find the missing rod.                       3.53   1.22   3.88   0.74   .07     .29 
4. Which is more?                              3.45   1.21   3.84   0.82   .03     .33 
5. Finding 'One bigger'                        3.44   1.20   3.48   1.25    NS     .04 
6. Find length of a train.                     2.14   1.20   3.10   1.16   .00     .80 
7. Half the size.                              1.25   1.01   1.84   1.16   .06     .59 

GRADE TWO 
8. Missing addends.                            3.78   0.46   4.17   0.18   .01     .85 
9. Ten more than.                              2.83   1.46   3.29   1.42    NS     .32 

GRADE THREE 
10. Write 3-digit numbers.                     3.85   1.14   4.26   0.36    NS     .36 
11. Simple 2-digit addition.                   3.41   1.36   4.06   0.35    NS     .47 
12. Two-digit addition with regrouping.        3.33   0.94   3.63   0.78    NS     .31 
13. Trading up.                                2.93   0.83   3.52   0.65   .06     .71 
14. Two-digit subtraction with regrouping      2.75   0.93   2.64   1.38    NS    -.12 

Note. Tabled numerals are means for 1-5 ratings, combinations of the four 
subtest scores: (a) conceptual understanding; (b) use of valid, 
effective strategies; (c) accuracy of task completion; (d) fluency & 
speed of task completion. 

*p value based on paired t-tests. 

**E.S. = Effect Size, standardized mean difference. 
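
The standardized mean differences in Table 5 can be approximately reproduced from the tabled means and standard deviations. The sketch below assumes the divisor is the first-assessment standard deviation; the source does not state this, but the assumption matches the tabled values for the rows shown. The function name is illustrative.

```python
def effect_size(first_mean, first_sd, second_mean):
    """Standardized mean difference: gain divided by the first-assessment SD.

    Assumption: the first-assessment SD is the standardizer; the source
    does not name the divisor it used.
    """
    return (second_mean - first_mean) / first_sd


# (1st M, 1st SD, 2nd M, tabled ES) copied from Table 5 for four tasks.
rows = {
    "1. Making Stair-steps": (3.76, 0.94, 3.60, -0.17),
    "8. Missing addends": (3.78, 0.46, 4.17, 0.85),
    "13. Trading up": (2.93, 0.83, 3.52, 0.71),
    "14. Two-digit subtraction with regrouping": (2.75, 0.93, 2.64, -0.12),
}

for task, (m1, sd1, m2, tabled) in rows.items():
    computed = effect_size(m1, sd1, m2)
    print(f"{task}: computed {computed:.2f}, tabled {tabled:.2f}")
```

For a few other rows the same calculation differs from the tabled value by .01, which would be expected if the original effect sizes were computed from unrounded means.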



Figure 1. Cluster trees for the interrelationship of four task ratings across grade 
level tasks for Kindergarten, Grade 2, and Grade 3. 

[Cluster trees not reproduced. Panels: Kindergarten Component Intercorrelations 
(26 students, 7 tasks; tree values .91 and .93); Grade 2 Component Intercorrelations 
(10 students, 2 tasks; Fluency and Accuracy joined at .91); Grade 3 Component 
Intercorrelations (9 students, 5 tasks; Fluency and Accuracy joined at .96; a further 
join at .58).] 

Note: The numerals on the cluster trees are Pearson correlation coefficients. 
Concept. Und. = Conceptual understanding; Accuracy = Accuracy of task completion; 
Fluency = Fluency & speed of task completion; Eff. Strategies = Use of valid, 
effective strategies. 



Figure 2. Patterns of Component Skills for First and 
Second Assessments at Each Grade Level. 

[Graph not reproduced. Vertical axis: Rating Criterion Scores (1-5). Horizontal axis: 
Concept. Under., Efficient Strategies, Accuracy, Fluency. Series: Kindergarten, 
Grade 2, Grade 3. Black = First Testing, White = Second Testing.] 



Figure 3. Relationship between Subscores on Manipulatives-based 
Test* and General Math Skills. 

[Interaction graph not reproduced. Horizontal axis: the four criteria subscores; 
one line per general math skill group.] 

*High (n=15): 87th-70th %iles. 
Medium (n=16): 58th-39th %iles. 
Low (n=14): 30th-12th %iles. 

Note: Error bars mark 85% Confidence Intervals 



Figure 4. Prediction of component scores on manipulatives-based 
test based on five English and Spanish Skill groupings* 

[Interaction graph not reproduced. Horizontal axis: Concept. Under., Efficient 
Strategies, Accuracy, Fluency. Series: FSS / LES, FSS / NES, LSS / LES, LSS / NES, 
NSS / NES.] 

*FSS/LES (n=5): Fluent Spanish/Limited English 
FSS/NES (n=13): Fluent Spanish/No English 
LSS/LES (n=12): Limited Spanish/Limited English 
LSS/NES (n=11): Limited Spanish/No English 
NSS/NES (n=4): No Spanish/No English 

Note: Error bars mark 85% Confidence Intervals for 
mean scores. Error Bars are provided only for highest 
and lowest scores, to reduce confusion. 



Figure 5. Prediction of component scores based upon gender. 

[Interaction graph not reproduced. Horizontal axis: Concept. Under., Efficient 
Strategies, Accuracy, Fluency. Series: Males, Females.] 

Note: Error bars mark 85% confidence intervals around mean scores. 



Appendix I 

Sample Portfolio Assessment Tasks 
for K-3 Mathematics 

[The rod drawings that accompanied each task are not reproduced.] 

1. Making Stair-steps. [K] 
"Here are seven rods. I want to line them up evenly on the bottom, so they 
make stair-steps. I started it for you. Now you finish it." 

2. Equal Trains. [K] 
"Here I have made a train out of the two rod and the five rod. Now you make 
a different train the same size, but you have to use different rods." 
Possible solutions include the 4 rod with the 3 rod. 

3. Find the missing rod. [K] 
"Here is the 10 rod. I want to make a train with two cars which is just as 
big as the 10 rod. I started it for you; you finish it." 
Answer: 4 rod 

4. Which is more? [K] 
"Make a train out of the 4 rod and the 5 rod. Now make a different train out 
of the 3 rod and the 7 rod. Show me which train has more." 

5. Finding "one bigger". [K] 
"This is the 5 rod. I want to find the rod that is just one bigger, and put 
it beside this 5 rod. Then say its name." 

6. Find the length of a train. [K] 
"Make a train out of the 6 rod and the 2 rod. Put the train on the 
numberline. Then tell me how long the train is." 

7. Half the size. [K] 
"Find the rod which is exactly half the size of this 8 rod. Show me how you 
know that it is half." 
Solution: two 4 rods 

8. Missing addends. [Gr. 2] 
"Show the missing rod that will sum to make 7. Then complete and read the 
equation." 

9. Ten More Than. [Gr. 2] 
This number is 14. Add ten more, and tell me what new number you have made. 

10. Writing Three-Digit Numbers. [Gr. 3] 
Write this number: 
Solution: 113 

11. Simple Two-Digit Addition. [Gr. 3] 
Solve the following problem by combining the rods. Then write and read your 
answer. 

12. Two-Digit Addition with Regrouping. [Gr. 3] 
Solve this addition problem by combining and trading the rods. Then write 
and read your answer. 

13. Trading Up. [Gr. 3] 
Trade up from ones to tens and from tens to hundreds. Then write the number. 
Solution: 342 

14. Two-Digit Subtraction with Regrouping. [Gr. 3] 
Solve this subtraction problem by combining and trading the ones and tens 
rods. Then [...] 